YouTube Link: https://youtu.be/LKBlnTX6nj0
Internet-enabled peer-to-peer platforms such as Airbnb have attracted growing attention from researchers and policymakers because of their impact on traditional real estate markets (Einav, Farronato, & Levin, 2016). On the demand side, tourists differ from residents in that they require only short-term accommodation rather than long-term rentals or mortgages. On the supply side, short-term rentals (intended to meet tourist demand) are more profitable than long-term rentals (for residents), which leads property owners to redirect their supply towards the tourist market. This eventually raises rental prices and sometimes displaces residents from city centers to the outskirts (Beatriz & Iis, 2020).
In this project, we use the Airbnb data set for Amsterdam to predict apartment prices and available days from apartment characteristics and neighborhood-level data. Based on the predictions of the OLS models, we then compare the short-term rental income from Airbnb with the long-term rental income from the local real estate market.
The direct beneficiaries of this prediction are hosts who need data to decide whether to list their property on Airbnb as a short-term rental or on the local market as a long-term rental. Professional real estate agents and housing managers can also draw on the results: they can identify which areas are more worth investing in and which rental market is likely to yield more income, so that they can allocate the properties they manage more rationally and increase their income. The Airbnb platform should also be interested in these results if it wants to attract more hosts, although it may well run similar analyses internally. If these prediction models provide solid evidence that Airbnb rentals are more profitable than local long-term rentals, Airbnb would likely be happy to present the results to attract more hosts, at least on its hosts' webpage.
Some websites provide Airbnb price information to support hosts' decision-making, but none compares Airbnb rents with local long-term rental market prices. Moreover, our model also predicts the number of days available for rent within a month. Although the reliability of this prediction remains to be studied because of problems with the data set, it offers another perspective to a certain extent.
Amsterdam is broken up into 8 districts or boroughs (Centrum, Zuid, West, Oost, Noord, Nieuw-West, Zuidoost, Westpoort), which are further divided into neighborhoods.
Our final goal is to compare short-term and long-term rental prices. For the long-term rental price, we obtain a data set of average rents at the district level. For the short-term rental price, we multiply the price per day by the number of days available per month.
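The comparison described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the data frame is made up rather than taken from our models, and the column names (pred_price, pred_avail_30, long_term_rent) are hypothetical.

```r
# Toy illustration only: all numbers are placeholders, not model output.
listings <- data.frame(
  district       = c("Centrum", "Zuid", "Noord"),
  pred_price     = c(180, 150, 110),   # predicted price per night (hypothetical)
  pred_avail_30  = c(6, 8, 10),        # predicted available days within 30 days (hypothetical)
  long_term_rent = c(566, 574, 529)    # district-level monthly rent (placeholder values)
)

# Short-term monthly income as defined in the text: daily price x available days.
listings$short_term_income <- listings$pred_price * listings$pred_avail_30

# A positive gap means the short-term estimate exceeds the long-term rent.
listings$income_gap <- listings$short_term_income - listings$long_term_rent
print(listings)
```

Keeping both income figures in one table per listing makes it easy to aggregate the gap by district afterwards.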
Our first step is to compile the data that we need into one data set. This includes the two dependent variables, the price and the number of available days within 30 days for each Airbnb apartment in Amsterdam, and the features needed to predict these two outcomes.
We will use scatter plots to investigate both the underlying spatial process in the outcome of interest and the trends and correlations between the outcome and the predictive features.
We will conduct feature engineering in three areas: 1) reclassifying some variables from the Airbnb listing data; 2) measuring exposure (distance) to public services and (dis)amenities; 3) text analysis to check whether the reviews of an apartment contain certain words.
Although we have nearly a hundred features to choose from in this project, we will limit the number of features actually used by the model to at most 20.
We will predict prices and available days with two separate OLS models.
House price prediction has been a common use case in cities that use data to assess property taxes. The hedonic model is a theoretical framework for predicting home prices by breaking down house prices into the value of their constituent parts, such as the presence of a pool or the amount of local crime.
For our purpose, Airbnb apartment prices and available days can be deconstructed into three constituent parts: 1) physical characteristics, such as whether the apartment provides Wi-Fi or a TV; 2) public services and (dis)amenities, such as the distance to transit stations and to historical buildings; 3) review text analysis, that is, whether specific words appear in the listing's review text. However, when developing the regression models we omit the spatial process of our dependent variables, namely how they cluster at the neighborhood, district, and city scales.
We use Airbnb listings data for Amsterdam from Dec 05 2021 to Sep 07 2022, collected by InsideAirbnb.com, which obtains and provides data to the public for research purposes. Our research uses the quarterly Airbnb data (Dec 05 2021, Mar 08 2022, Jun 05 2022, Sep 07 2022) for the last year.
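A minimal sketch of the hedonic idea, assuming simulated data: the variables below stand in for one feature from each of the three parts, and the coefficients used to generate the prices are arbitrary choices for illustration, not estimates from the Amsterdam data.

```r
# Simulated data for illustration only; the generating coefficients are arbitrary.
set.seed(1)
n <- 200
toy <- data.frame(
  wifi       = rbinom(n, 1, 0.8),   # physical characteristic (0/1)
  trans_dist = runif(n, 0, 2),      # distance to the nearest transit stop
  clean_word = rbinom(n, 1, 0.5)    # reviews mention "clean" (0/1)
)
toy$price <- 100 + 15 * toy$wifi - 20 * toy$trans_dist +
  5 * toy$clean_word + rnorm(n, sd = 10)

# OLS: each coefficient is the implied price contribution of one component.
hedonic <- lm(price ~ wifi + trans_dist + clean_word, data = toy)
round(coef(hedonic), 1)
```

Each fitted coefficient can be read as the price change associated with that component while the others are held fixed, which is the decomposition the hedonic framework relies on.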
ggplot() + geom_sf(data = neighbourhoods_geo, fill = "grey40") +
stat_density2d(data = data.frame(st_coordinates(airbnb_geo)),
aes(X, Y, fill = ..level.., alpha = ..level..),
size = 0.01, bins = 40, geom = 'polygon') +
scale_fill_gradient(low = "#25CB10", high = "#FA7800",
breaks=c(0.000000003,0.00000003),
labels=c("Minimum","Maximum"), name = "Density") +
scale_alpha(range = c(0.00, 0.35), guide = "none") + # guide = FALSE is deprecated in recent ggplot2 versions
labs(title = "Density of Short Term Housing, Amsterdam" , subtitle = "Map 3-1") +
mapTheme()
The Airbnb listing density map shows that a lot of short-term housing is located in the center of Amsterdam, namely the West, Centrum, and Zuid districts.
We also download external data from Maps Data City Amsterdam (https://maps.amsterdam.nl/open_geodata/?LANG=en), which provides official construction and social data for the city of Amsterdam. We obtain housing stock, mean rent price, and population data from this website.
new_neighbor <- neighbourhoods_geo
housing_stock <- c(7432,7432,8208,4763,544, 8208,751,7432, 1831, 574, NA, 979, 3465, 1056,8298,1904,
NA, NA, 1869, 2113, 294, 5192
)
new_neighbor <- new_neighbor %>%
cbind(housing_stock)%>% # the pipe already supplies new_neighbor as the first argument
dplyr::select(neighbourhood,housing_stock, geometry)
#new_neighbor
ggplot(new_neighbor, aes(x=neighbourhood, y=housing_stock, fill=neighbourhood))+
geom_bar(stat="identity", color="black")+
scale_fill_manual(values=palette22)+
theme(text = element_text(size = 5),element_line(size =0.5))+
labs(title = "Long Term Housing Stock in Each Neighbourhood", subtitle = "Graph 3-2")
new_neighborP <- neighbourhoods_geo
housing_price <- c(566,566,529,574,NA, 529,NA,566, 566, NA, NA, NA, 538, NA,560,NA,
529, NA, NA,NA, NA, NA
)
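new_neighborP <- new_neighborP %>%
cbind(housing_price)%>% # the pipe already supplies new_neighborP as the first argument
dplyr::select(neighbourhood,housing_price, geometry)
#new_neighborP
# plot from new_neighborP, which is the object that holds housing_price
ggplot(new_neighborP, aes(x=neighbourhood, y=housing_price, fill=neighbourhood))+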
geom_bar(stat="identity", color="black")+
scale_fill_manual(values=c("#00988e", "#008f8c", "#008689", "#007d85", "#077480", "#146b79", "#1d6272", "#23596a", "#275061", "#2a4858"))+
theme(text = element_text(size = 5),element_line(size =0.5))+
labs(title = "Long Term Housing Rent in Each Neighbourhood" , subtitle = "Graph 3-3")
colony_10km <- st_buffer(trans_stops, 390)
ggplot() +
geom_sf(data = neighbourhoods_geo) +
geom_sf(data = airbnb_geo, color = '#A5ABC2')+
geom_sf(data = colony_10km, color = 'red',fill=NA)+
labs(title = "Airbnb Listings and Transit Stations" , subtitle = "Map 3-4") +
mapTheme()
The transit stations map shows that most stations are located in central Amsterdam. Based on this, we assume that the distance from each Airbnb listing to the nearest station may play an important role in setting the price.
#airbnb_geo[ c(15,25,28,29, 31, 32:40, 42:45, 47:49, 52:58, 61:65)] <- sapply(airbnb_geo[c(15,25,28,29, 31, 32:39, 42:45, 47:49, 52:58, 61:65)], as.numeric)
#airbnb_geo[ c(host_listings_count, host_total_listings_count,accommodates,bedrooms,beds, price, 32:39, 42:45, 47:49, 52:58, 61:65)] <- sapply(airbnb_geo[c(15,25,28,29, 31, 32:40, 42:45, 47:49, 52:58, 61:65)], as.numeric, na.rm = T)
airbnb_geo <- airbnb_geo %>% #nearest neighbor distance
mutate(
landmark_nn1 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(historical)), 1),
landmark_nn2 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(historical)), 2),
landmark_nn3 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(historical)), 3),
landmark_nn4 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(historical)), 4),
landmark_nn5 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(historical)), 5))
airbnb_geo <- airbnb_geo %>% #nearest neighbor distance
mutate(
trans_nn1 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(trans_stops)), 1),
trans_nn2 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(trans_stops)), 2),
trans_nn3 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(trans_stops)), 3),
trans_nn4 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(trans_stops)), 4),
trans_nn5 = nn_function(st_coordinates(st_centroid(airbnb_geo)), st_coordinates(st_centroid(trans_stops)), 5))
#historical
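The helper nn_function is not defined in this section. The sketch below assumes it returns, for each listing, the mean distance to its k nearest features; mean_knn_distance is a hypothetical base-R stand-in written for illustration, not the function used above.

```r
# Hypothetical stand-in for nn_function(): mean Euclidean distance from each
# row of `from` to its k nearest rows of `to` (both are coordinate matrices).
mean_knn_distance <- function(from, to, k) {
  from <- as.matrix(from)
  to   <- as.matrix(to)
  apply(from, 1, function(p) {
    # distance from point p to every row of `to`
    d <- sqrt(rowSums((to - matrix(p, nrow(to), ncol(to), byrow = TRUE))^2))
    mean(sort(d)[seq_len(k)])
  })
}

# Toy coordinates (not real listings or stations).
listings_xy <- rbind(c(0, 0), c(10, 0))
stops_xy    <- rbind(c(0, 3), c(0, 5), c(10, 4))

mean_knn_distance(listings_xy, stops_xy, 1)   # distance to the single nearest stop
mean_knn_distance(listings_xy, stops_xy, 2)   # mean distance to the two nearest stops
```

Distances come out in the units of the input coordinates, so unprojected longitude and latitude would give distances in degrees rather than metres.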
numeric_listings <- airbnb_geo %>%
dplyr::select(-host_id,-id,-geometry)
numericVars <-
select_if(st_drop_geometry(numeric_listings), is.numeric) %>% na.omit()
ggcorrplot(
round(cor(numericVars), 1),
p.mat = cor_pmat(numericVars),
colors = c("#25CB10", "white", "#FA7800"),
lab_size = 1,
tl.cex = 5,
type="lower",
insig = c("pch", "blank"), pch = 1, pch.col = "black", pch.cex =1) +
labs(title = "Correlation across numeric variables")+
theme(text = element_text(size = 5),element_line(size =0.5))
Based on the correlation matrix, we observed many highly correlated variables in the data set, such as trans_nn, landmark_nn, availability_30, availability_60, etc. When we were building the regression models
ggplot(airbnb_geo, aes(x=number_of_reviews, y=price)) +
geom_point()+
geom_smooth(method=lm, colour="red") +
labs( title = 'Price as a Function of Number of Reviews',subtitle = sprintf('correlation = %s',round(cor(airbnb_geo$number_of_reviews, airbnb_geo$price), 2)), caption = 'Figure 4-1')+
plotTheme()
ggplot(airbnb_geo, aes(x=review_scores_value, y=price)) +
geom_point()+
geom_smooth(method=lm, colour="red") +
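# the subtitle now reports the correlation for review_scores_value (the variable plotted), skipping missing scores
labs( title = 'Price as a Function of Review Scores',subtitle = sprintf('correlation = %s',round(cor(airbnb_geo$review_scores_value, airbnb_geo$price, use = "complete.obs"), 2)), caption = 'Figure 4-2')+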
plotTheme()
We were interested in the effect of reviews on the listing price; however, neither the number of reviews nor the review scores show a strong correlation with our dependent variable (price).
In this section, we created 6 dummy variables to help improve the regression accuracy: wifi, tv, private, dryer, quite, and clean. We picked the first four because we thought they are the most important features that guests look for. We created these features by searching the listing description column for each keyword and assigning a value of 1 if it is present and 0 otherwise. Similarly, quite and clean are the two most frequent words that appear in the description column.
Reviews from former guests may be important information when someone decides which Airbnb apartment to choose. For this reason, we believe that the review text might correlate with apartment prices in some instances.
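The keyword search can be sketched as follows. The descriptions are made up, and has_keyword is a hypothetical helper introduced for illustration; it is one plausible way to build such flags, not necessarily the code used for the project.

```r
# Toy descriptions (made up); the real input would be the listing description column.
toy_listings <- data.frame(
  description = c("Quiet room with WiFi and TV",
                  "Private studio, washer and dryer",
                  NA),
  stringsAsFactors = FALSE
)

# 1 if the keyword appears (case-insensitive), 0 otherwise; missing text counts as 0.
has_keyword <- function(text, keyword) {
  as.integer(!is.na(text) & grepl(keyword, text, ignore.case = TRUE))
}

for (kw in c("wifi", "tv", "private", "dryer")) {
  toy_listings[[kw]] <- has_keyword(toy_listings$description, kw)
}
toy_listings
```

Plain substring matching also flags a keyword inside a longer word, so a short keyword such as tv may need a word-boundary pattern like "\\btv\\b" on real text.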
review_scores <- airbnb %>%
mutate_if(is.character,as.numeric)%>%
dplyr::select("id","review_scores_rating","review_scores_accuracy","review_scores_cleanliness","review_scores_checkin","review_scores_communication","review_scores_location", "review_scores_value") %>%
st_drop_geometry(.) %>%
na.omit()
review_scores.id <-
review_scores %>%
group_by(id) %>%
summarise_at(c("review_scores_rating", "review_scores_accuracy", "review_scores_cleanliness", "review_scores_checkin", "review_scores_communication", "review_scores_location", "review_scores_value"), mean, na.rm = TRUE)
id.nb <-
subset(airbnb_geo[c("id", "neighbourhood_cleansed", "geometry")])%>%
group_by(id)
review_scores.nb <-
merge(id.nb, review_scores.id, by = "id")%>%
group_by(neighbourhood_cleansed)%>%
summarise_at(c("review_scores_rating", "review_scores_accuracy", "review_scores_cleanliness", "review_scores_checkin", "review_scores_communication", "review_scores_location", "review_scores_value"), mean, na.rm = TRUE)%>%
st_drop_geometry()
review_scores.nb <- merge(x = review_scores.nb, y = nb, by.x = "neighbourhood_cleansed", by.y = "neighbourhood") %>%st_as_sf()
ggplot() +
# geom_sf(data = nb, color = "#767E8E", fill = "transparent")
geom_sf(data = review_scores.nb, aes(fill = q5(review_scores_rating)))+
scale_fill_manual(values = palette5,
labels = qBr(review_scores.nb, "review_scores_rating"),
name = "Rating\n(Quintile Breaks)") +
labs(title = "Airbnb Review Rating in Each Neighborhood",
subtitle = "Map 5-1")
First, we run K-means clustering on the neighborhoods to classify them according to the review scores of their apartments. Each apartment has 7 review dimensions: rating, accuracy, cleanliness, checkin, communication, location, and value. They are selected by review_scores.nb[c(2:8)].
data_scaled<- scale(review_scores.nb[c(2:8)]%>%st_drop_geometry())
distance <- get_dist(data_scaled)
fviz_dist(distance, gradient = list(low = "#00AFBB", mid = "white", high = "#EB4C60"))
set.seed(123)
k2 <- kmeans(data_scaled, centers = 2, nstart = 25)
k3 <- kmeans(data_scaled, centers = 3, nstart = 25)
k4 <- kmeans(data_scaled, centers = 4, nstart = 25)
k5 <- kmeans(data_scaled, centers = 5, nstart = 25)
p1 <- fviz_cluster(k2, geom = "point", data = data_scaled, labelsize = 1, ellipse.type = "convex", ellipse.alpha = 0 ) + ggtitle("k = 2") +
theme(axis.line = element_line(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
p2 <- fviz_cluster(k3, geom = "point", data = data_scaled, labelsize = 1, ellipse.type = "convex", ellipse.alpha = 0 ) + ggtitle("k = 3") +
theme(axis.line = element_line(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
p3 <- fviz_cluster(k4, geom = "point", data = data_scaled, labelsize = 1, ellipse.type = "convex", ellipse.alpha = 0 ) + ggtitle("k = 4") +
theme(axis.line = element_line(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
p4 <- fviz_cluster(k5, geom = "point", data = data_scaled, labelsize = 1, ellipse.type = "convex", ellipse.alpha = 0 ) + ggtitle("k = 5") +
theme(axis.line = element_line(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())
grid.arrange(p1, p2, p3, p4, nrow = 2)
The clustering with our chosen variables proceeds as follows. We begin by scaling, or normalizing, the data, which places every variable on a scale with a mean of 0 and a standard deviation of 1. All variables in the algorithm must be measured on the same scale in order to be given equal weight in the next step, which is to calculate the Euclidean distance between each pair of neighborhoods across the variables. The fviz_dist function can be used to visualize this distance matrix.
To maximize the differences between groups while minimizing the differences among observations within each group, we choose the four-cluster solution (k = 4) and examine its characteristics below.
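The scaling and distance steps can be checked on a small example. The scores below are made up, and base R's dist is used here as a stand-in for get_dist, on the assumption that both compute Euclidean distances by default.

```r
# Toy neighbourhood-level review scores (made up).
scores <- data.frame(
  rating   = c(4.75, 4.82, 4.83, 4.81),
  location = c(4.65, 4.81, 4.68, 4.60)
)

scaled <- scale(scores)        # centre each column and divide by its standard deviation
round(colMeans(scaled), 10)    # column means after scaling
apply(scaled, 2, sd)           # column standard deviations after scaling

# Pairwise Euclidean distances between the rows (neighbourhoods).
d <- dist(scaled, method = "euclidean")
as.matrix(d)
```

Without scaling, a variable with a wider numeric range would dominate the distance, which is why the normalization comes first.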
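One way to compare candidate values of k is to look at the within-group and between-group sums of squares that kmeans reports. The sketch below uses simulated data, and it is an assumption that such a comparison matches how the four-cluster solution was selected.

```r
set.seed(123)
# Simulated 2-D data with three loose groups; for illustration only.
toy <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 4), ncol = 2),
  matrix(rnorm(40, mean = 8), ncol = 2)
)
toy_scaled <- scale(toy)

# For each k, record within-group and between-group sums of squares.
fit_stats <- do.call(rbind, lapply(2:5, function(k) {
  fit <- kmeans(toy_scaled, centers = k, nstart = 25)
  data.frame(k = k, within_ss = fit$tot.withinss, between_ss = fit$betweenss)
}))
fit_stats
```

The two quantities always add up to the same total, so the useful signal is the value of k after which the within-group sum of squares stops dropping sharply.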
cltclusters<- review_scores.nb %>%
mutate(cluster4 = k4$cluster) %>%
group_by(cluster4) %>%
summarise_all("mean") %>%
select(-c("neighbourhood_cleansed"))
kable(x=cltclusters)%>%kable_minimal()
| cluster4 | review_scores_rating | review_scores_accuracy | review_scores_cleanliness | review_scores_checkin | review_scores_communication | review_scores_location | review_scores_value | neighbourhood_group | geometry |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 4.753112 | 4.792388 | 4.701025 | 4.851735 | 4.839778 | 4.647258 | 4.619955 | NA | MULTIPOLYGON (((4.839426 52… |
| 2 | 4.818214 | 4.847455 | 4.761887 | 4.880969 | 4.894309 | 4.809295 | 4.651767 | NA | POLYGON ((4.848885 52.35791… |
| 3 | 4.834702 | 4.859731 | 4.767027 | 4.903170 | 4.913189 | 4.679830 | 4.693334 | NA | MULTIPOLYGON (((4.991614 52… |
| 4 | 4.813629 | 4.846609 | 4.778137 | 4.879906 | 4.865706 | 4.601134 | 4.672704 | NA | POLYGON ((4.899007 52.33071… |
We then join the cluster assignments to the neighborhoods in Map 5-2.
cltdata <- review_scores.nb %>%
mutate(cluster4 = k4$cluster) %>%st_as_sf()
ggplot() +
# geom_sf(data = nb, color = "#767E8E", fill = "transparent")
geom_sf(data = cltdata, aes(fill = q5(cluster4)))+
scale_fill_manual(values = palette5,
labels = qBr(cltdata, "cluster4"),
name = "Cluster 4\n(Quintile Breaks)") +
labs(title = "Airbnb Cluster in Each Neighborhood", subtitle = "Map 5-2")
Here we begin our text analysis with data cleaning. First, we join all review text into one data set, then trim the text and convert it to lowercase. The second step is removing punctuation, followed by removing stop words in English, French, and Spanish. Finally, we count word frequencies and remove words that occur fewer than 5 times.
Our data set has a large number of reviews (about 1.2 million rows). After cleaning, we calculated the most frequently mentioned words, shown in the figure below. Some words, such as “time”, “location”, and “clean”, are mentioned especially often.
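The cleaning steps can be sketched in base R. The reviews are made up, the stop-word vector is a tiny hypothetical stand-in for full English, French, and Spanish lists, and the frequency threshold is lowered from 5 to 2 so that the toy input still leaves some words.

```r
# Toy reviews (made up) and a tiny hypothetical stop-word list.
reviews    <- c("  The location was GREAT, very clean! ",
                "Clean room; great location.",
                "Le quartier est calme, great host")
stop_words <- c("the", "was", "very", "le", "est")
min_count  <- 2   # the text uses a threshold of 5; lowered for this toy input

text   <- tolower(trimws(reviews))          # trim and convert to lowercase
text   <- gsub("[[:punct:]]", " ", text)    # replace punctuation with spaces
tokens <- unlist(strsplit(text, "\\s+"))    # split into words
tokens <- tokens[tokens != "" & !(tokens %in% stop_words)]

counts <- sort(table(tokens), decreasing = TRUE)
counts <- counts[counts >= min_count]       # drop rare words
counts
```

On 1.2 million reviews a tokenizing package would likely be faster, but the order of operations stays the same.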
set.seed(12345)
wordcloud2(data=words, size=1.6, color='random-dark')
words %>%
filter(nn >= 5000) %>%
arrange(nn) %>%
# group_by(d) %>%
top_n(25, nn) %>%
ungroup() %>%
mutate(n = factor(word, unique(word))) %>%
ggplot(aes(word, nn)) +
geom_col(show.legend = FALSE) +
# facet_wrap(~ d, scales = "free", ncol = 3) +
coord_flip() +
labs(x = NULL,
y = "Words counts",
title = "Review text top mentioned words count",
subtitle = "Figure 5-1")
To examine whether the top mentioned words correlate with price, we extracted a sample data set with 1000 words from the whole reviews data set and created a new column for a given word, for example “clean”, recording 1 if the review contains the word and 0 otherwise. We then plot the mean price of apartments in the two groups, with and without “clean” mentioned in their reviews.
w = "clean"
churn <-
dat %>%
mutate(binary = ifelse(str_detect(dat$comments, w) == TRUE, 1, 0))
churn <-
merge(x = churn, y = dec_price.id, by.x = "listing_id", by.y = "id", all.x = TRUE) %>%
na.omit()
churn %>%
dplyr::select(price, binary) %>%
gather(Variable, value, price) %>%
ggplot(aes(binary, value, fill=binary)) +
geom_bar(position = "dodge", stat = "summary", fun = "mean") +
# facet_wrap(~Variable, scales = "free") +
# scale_fill_manual(values = palette2) +
labs(x=w, y="Mean",
title = "Prices associations with whether contain clean in reviews",
subtitle = "Figure 5-2") +
theme(legend.position = "none")
Figure 5-2 shows that prices may differ slightly depending on whether the reviews of an apartment contain “clean”. We will add the words that show a relationship with price to our regression model as dummy variables.
#split the dataset
#airbnb_geo <- airbnb_geo%>%
#filter(!(property_type %in% c("Cave"," Entire home/apt", "Shared room in bed and breakfast", "Shared room in boat")) )
airbnb_geo <-airbnb_geo[-c(4598, 10294, 15647, 21832, 4876, 10565, 15907 ,22087, 338 , 6063 ,11617, 17743,3192 , 8927, 14385, 20537,4265, 9976 ,15357, 21539 ),]
set.seed(3456)
inTrain <- createDataPartition(
y = paste( airbnb_geo$neighbourhood_group, airbnb_geo$neighbourhood_group_cleansed, airbnb_geo$property_type, airbnb_geo$bathrooms_text
),
p = .60, list = FALSE)
listing_training <- airbnb_geo[inTrain,]
listing_test <- airbnb_geo[-inTrain,]
listing_test_new <- listing_test # Duplicate test data set
listing_test_new$property_type[which(!(listing_test_new$property_type %in% unique(listing_training$property_type)))] <- NA # Replace new levels by NA
#data_test_new
#factor property_type has new levels Cave, Entire home/apt, Shared room in bed and breakfast, Shared room in boa
reg.1 <- lm(price ~ ., data = as.data.frame(listing_training) %>%
dplyr::select( price,host_response_rate, host_acceptance_rate, host_listings_count, host_total_listings_count, neighbourhood_cleansed, property_type, room_type, accommodates, bathrooms_text, minimum_nights, maximum_nights, availability_30, number_of_reviews, review_scores_rating, review_scores_accuracy, reviews_per_month))
Based on our goal for this project, we believe it is necessary to include the following predictors: distance to landmarks, amenity features, and host information. The model is an OLS regression with the price column as the dependent variable, fitted on the training set.
reg.2 <- lm(price ~ ., data = as.data.frame(listing_training) %>%
dplyr::select( price, host_response_rate, host_acceptance_rate, host_listings_count, host_total_listings_count, neighbourhood_cleansed, room_type, accommodates,property_type, bathrooms_text, minimum_nights, maximum_nights, availability_30, number_of_reviews, review_scores_rating, reviews_per_month, host_has_profile_pic, host_identity_verified, beds,bedrooms
,landmark_nn3,landmark_nn4,trans_nn3,trans_nn5,wifi, tv, private , dryer, quite, clean
))
stargazer(reg.2 ,type = "text",
title = "Summary Statistics of Airbnb Price Prediction Model ",
header = FALSE,
single.row = TRUE)
##
## Summary Statistics of Airbnb Price Prediction Model
## ========================================================================================
## Dependent variable:
## ---------------------------
## price
## ----------------------------------------------------------------------------------------
## host_response_rate -21.519*** (7.316)
## host_acceptance_rate 10.887*** (4.157)
## host_listings_count -0.219*** (0.084)
## host_total_listings_count 0.239*** (0.075)
## neighbourhood_cleansedBijlmer-Oost 15.290 (15.817)
## neighbourhood_cleansedBos en Lommer -2.823 (11.324)
## neighbourhood_cleansedBuitenveldert - Zuidas -10.416 (12.258)
## neighbourhood_cleansedCentrum-Oost 36.638*** (12.755)
## neighbourhood_cleansedCentrum-West 40.139*** (12.723)
## neighbourhood_cleansedDe Aker - Nieuw Sloten 56.543*** (13.117)
## neighbourhood_cleansedDe Baarsjes - Oud-West 20.463* (12.247)
## neighbourhood_cleansedDe Pijp - Rivierenbuurt 19.410 (12.140)
## neighbourhood_cleansedGaasperdam - Driemond 27.635** (12.676)
## neighbourhood_cleansedGeuzenveld - Slotermeer 28.435** (11.969)
## neighbourhood_cleansedIJburg - Zeeburgereiland 20.469* (11.445)
## neighbourhood_cleansedNoord-Oost -13.847 (13.759)
## neighbourhood_cleansedNoord-West -48.598*** (14.661)
## neighbourhood_cleansedOostelijk Havengebied - Indische Buurt -8.499 (12.776)
## neighbourhood_cleansedOsdorp 36.927*** (14.138)
## neighbourhood_cleansedOud-Noord -6.789 (13.072)
## neighbourhood_cleansedOud-Oost -1.048 (12.637)
## neighbourhood_cleansedSlotervaart 9.956 (12.940)
## neighbourhood_cleansedWatergraafsmeer -9.156 (11.693)
## neighbourhood_cleansedWesterpark 3.482 (12.701)
## neighbourhood_cleansedZuid 30.115** (12.356)
## room_typeHotel room -178.730*** (35.165)
## room_typePrivate room -166.992*** (29.845)
## room_typeShared room -60.391 (61.789)
## accommodates 29.027*** (1.305)
## property_typeBoat 3.069 (34.991)
## property_typeCamper/RV -109.671 (96.394)
## property_typeCasa particular 28.965 (71.354)
## property_typeEntire bungalow -18.847 (49.306)
## property_typeEntire cabin -22.404 (43.899)
## property_typeEntire chalet -7.072 (41.792)
## property_typeEntire condo 49.097 (34.162)
## property_typeEntire condominium (condo) -7.307 (34.783)
## property_typeEntire cottage 31.370 (40.603)
## property_typeEntire guest suite -17.713 (35.731)
## property_typeEntire guesthouse 11.470 (35.798)
## property_typeEntire home 0.679 (34.144)
## property_typeEntire loft 64.764* (34.479)
## property_typeEntire place -31.151 (42.338)
## property_typeEntire rental unit -1.464 (33.994)
## property_typeEntire residential home -48.800 (34.645)
## property_typeEntire serviced apartment 42.679 (34.556)
## property_typeEntire townhouse -8.662 (34.451)
## property_typeEntire vacation home 27.593 (42.431)
## property_typeEntire villa 55.356 (37.793)
## property_typeFarm stay -19.524 (44.127)
## property_typeHouseboat 41.218 (34.604)
## property_typePrivate room 116.454** (49.348)
## property_typePrivate room in barn 9.533 (99.304)
## property_typePrivate room in bed and breakfast 125.482*** (45.403)
## property_typePrivate room in boat 110.052** (45.990)
## property_typePrivate room in bungalow 154.773** (68.336)
## property_typePrivate room in cabin 132.421** (59.977)
## property_typePrivate room in casa particular 101.250** (49.940)
## property_typePrivate room in condo 138.354*** (46.104)
## property_typePrivate room in condominium (condo) 95.841** (47.148)
## property_typePrivate room in earthen home 151.750** (77.143)
## property_typePrivate room in farm stay 146.688*** (49.378)
## property_typePrivate room in guest suite 123.290*** (45.749)
## property_typePrivate room in guesthouse 139.173*** (49.688)
## property_typePrivate room in home 153.628*** (45.691)
## property_typePrivate room in hostel 144.329*** (49.214)
## property_typePrivate room in houseboat 125.286*** (45.672)
## property_typePrivate room in loft 103.473** (46.335)
## property_typePrivate room in nature lodge 474.245*** (77.451)
## property_typePrivate room in rental unit 108.490** (45.435)
## property_typePrivate room in residential home 96.483** (45.980)
## property_typePrivate room in serviced apartment 246.638*** (48.656)
## property_typePrivate room in tiny home 129.505** (59.823)
## property_typePrivate room in tiny house 112.773* (62.996)
## property_typePrivate room in townhouse 123.552*** (45.663)
## property_typePrivate room in villa 155.870*** (51.790)
## property_typeRoom in aparthotel 136.175*** (39.660)
## property_typeRoom in bed and breakfast 147.678*** (50.310)
## property_typeRoom in boutique hotel 176.837*** (45.686)
## property_typeRoom in hostel 138.154*** (51.918)
## property_typeRoom in hotel 170.102*** (45.862)
## property_typeRoom in serviced apartment 290.827*** (51.064)
## property_typeShared room in bed and breakfast 28.941 (83.324)
## property_typeShared room in home 7.163 (80.666)
## property_typeShared room in hostel 1.358 (55.252)
## property_typeShared room in houseboat 119.852* (61.380)
## property_typeShared room in rental unit -0.721 (58.525)
## property_typeShared room in residential home
## property_typeTent -9.205 (77.093)
## property_typeTiny home 16.573 (41.908)
## property_typeTiny house 1.790 (70.982)
## property_typeTower 198.824*** (47.775)
## property_typeWindmill 129.586* (71.821)
## property_typeYurt -34.042 (94.966)
## bathrooms_text0 baths -44.543 (30.172)
## bathrooms_text0 shared baths -34.846 (31.176)
## bathrooms_text1 bath -11.523 (24.097)
## bathrooms_text1 private bath -7.950 (23.859)
## bathrooms_text1 shared bath -16.907 (24.189)
## bathrooms_text1.5 baths -2.863 (24.081)
## bathrooms_text1.5 shared baths -16.854 (24.316)
## bathrooms_text2 baths 37.375 (24.408)
## bathrooms_text2 shared baths -4.818 (32.255)
## bathrooms_text2.5 baths 83.452*** (24.920)
## bathrooms_text2.5 shared baths 294.456*** (95.721)
## bathrooms_text3 baths 108.671*** (25.655)
## bathrooms_text3 shared baths -48.705 (39.929)
## bathrooms_text3.5 baths 84.128*** (29.347)
## bathrooms_text3.5 shared baths 47.428 (47.496)
## bathrooms_text4 baths 20.972 (44.329)
## bathrooms_text4.5 baths 277.202*** (43.751)
## bathrooms_text5 baths 47.393 (38.285)
## bathrooms_text5.5 baths 135.792** (58.737)
## bathrooms_textHalf-bath -31.039 (37.636)
## bathrooms_textPrivate half-bath -31.607 (35.474)
## bathrooms_textShared half-bath -99.405*** (28.684)
## minimum_nights 0.025 (0.033)
## maximum_nights -0.003* (0.002)
## availability_30 2.028*** (0.133)
## number_of_reviews -0.078*** (0.013)
## review_scores_rating 14.760*** (3.265)
## reviews_per_month -0.717 (0.490)
## host_has_profile_pict -42.579*** (15.313)
## host_identity_verifiedt 6.871** (2.763)
## beds -7.800*** (1.206)
## bedrooms 28.168*** (1.985)
## landmark_nn3 5,193.457*** (1,747.448)
## landmark_nn4 -6,136.821*** (1,697.320)
## trans_nn3 2,411.419 (1,555.101)
## trans_nn5 -2,219.012 (1,670.408)
## wifi 15.326*** (3.285)
## tv 5.034* (2.574)
## private -5.551** (2.308)
## dryer -1.852 (3.825)
## quite 1.636 (5.590)
## clean 1.156 (3.215)
## Constant 67.164 (49.536)
## ----------------------------------------------------------------------------------------
## Observations 8,866
## R2 0.512
## Adjusted R2 0.505
## Residual Std. Error 88.185 (df = 8730)
## F Statistic 67.913*** (df = 135; 8730)
## ========================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
#property_type
#bathrooms_text,
To predict the availability of each Airbnb listing, we built an OLS model with variables related to the listing's reviews, host information, and room features. As before, the model is fitted on the training set.
reg_demand1 <- lm(availability_30 ~ ., data = as.data.frame(listing_training) %>%
dplyr::select( availability_30, host_response_rate, host_acceptance_rate, host_listings_count, host_total_listings_count, neighbourhood_cleansed, room_type, accommodates,property_type, bathrooms_text, minimum_nights, maximum_nights, price, number_of_reviews, review_scores_rating, reviews_per_month, host_has_profile_pic, host_identity_verified, beds,bedrooms,landmark_nn3,landmark_nn4,trans_nn3,trans_nn5,wifi, tv, private , dryer, quite, clean,availability_90 ))
stargazer(reg_demand1 ,type = "text",
title = "Summary Statistics of Model Airbnb Price Availability Model ",
header = FALSE,
single.row = TRUE)
##
## Summary Statistics of Model Airbnb Price Availability Model
## ========================================================================================
## Dependent variable:
## ---------------------------
## availability_30
## ----------------------------------------------------------------------------------------
## host_response_rate -0.296 (0.353)
## host_acceptance_rate 0.506** (0.201)
## host_listings_count 0.007* (0.004)
## host_total_listings_count -0.005 (0.004)
## neighbourhood_cleansedBijlmer-Oost 1.572** (0.762)
## neighbourhood_cleansedBos en Lommer 1.121** (0.546)
## neighbourhood_cleansedBuitenveldert - Zuidas 0.498 (0.591)
## neighbourhood_cleansedCentrum-Oost 1.369** (0.615)
## neighbourhood_cleansedCentrum-West 1.431** (0.614)
## neighbourhood_cleansedDe Aker - Nieuw Sloten -0.728 (0.633)
## neighbourhood_cleansedDe Baarsjes - Oud-West 1.288** (0.590)
## neighbourhood_cleansedDe Pijp - Rivierenbuurt 1.269** (0.585)
## neighbourhood_cleansedGaasperdam - Driemond -0.522 (0.611)
## neighbourhood_cleansedGeuzenveld - Slotermeer 0.374 (0.577)
## neighbourhood_cleansedIJburg - Zeeburgereiland -0.181 (0.552)
## neighbourhood_cleansedNoord-Oost 1.881*** (0.663)
## neighbourhood_cleansedNoord-West 1.441** (0.707)
## neighbourhood_cleansedOostelijk Havengebied - Indische Buurt 1.627*** (0.616)
## neighbourhood_cleansedOsdorp -0.760 (0.682)
## neighbourhood_cleansedOud-Noord 1.765*** (0.630)
## neighbourhood_cleansedOud-Oost 1.245** (0.609)
## neighbourhood_cleansedSlotervaart 1.064* (0.624)
## neighbourhood_cleansedWatergraafsmeer 1.299** (0.564)
## neighbourhood_cleansedWesterpark 1.280** (0.612)
## neighbourhood_cleansedZuid 0.964 (0.596)
## room_typeHotel room 2.208 (1.697)
## room_typePrivate room 2.690* (1.441)
## room_typeShared room -2.798 (2.978)
## accommodates 0.143** (0.065)
## property_typeBoat 0.950 (1.686)
## property_typeCamper/RV 11.829** (4.644)
## property_typeCasa particular -2.292 (3.439)
## property_typeEntire bungalow -0.026 (2.376)
## property_typeEntire cabin -0.976 (2.116)
## property_typeEntire chalet -0.006 (2.014)
## property_typeEntire condo 0.189 (1.646)
## property_typeEntire condominium (condo) 0.787 (1.676)
## property_typeEntire cottage -0.836 (1.957)
## property_typeEntire guest suite -0.319 (1.722)
## property_typeEntire guesthouse 0.751 (1.725)
## property_typeEntire home -0.340 (1.645)
## property_typeEntire loft -0.505 (1.662)
## property_typeEntire place 1.189 (2.040)
## property_typeEntire rental unit 0.343 (1.638)
## property_typeEntire residential home 1.311 (1.670)
## property_typeEntire serviced apartment 1.393 (1.666)
## property_typeEntire townhouse -0.167 (1.660)
## property_typeEntire vacation home -0.772 (2.045)
## property_typeEntire villa 0.656 (1.821)
## property_typeFarm stay 2.317 (2.126)
## property_typeHouseboat 0.852 (1.668)
## property_typePrivate room -2.289 (2.379)
## property_typePrivate room in barn 5.811 (4.785)
## property_typePrivate room in bed and breakfast -3.032 (2.189)
## property_typePrivate room in boat -0.512 (2.217)
## property_typePrivate room in bungalow -3.390 (3.294)
## property_typePrivate room in cabin -1.973 (2.891)
## property_typePrivate room in casa particular -3.889 (2.407)
## property_typePrivate room in condo -4.900** (2.223)
## property_typePrivate room in condominium (condo) -1.510 (2.273)
## property_typePrivate room in earthen home -5.854 (3.718)
## property_typePrivate room in farm stay -2.766 (2.381)
## property_typePrivate room in guest suite -2.732 (2.206)
## property_typePrivate room in guesthouse -2.481 (2.395)
## property_typePrivate room in home -3.922* (2.203)
## property_typePrivate room in hostel -0.920 (2.373)
## property_typePrivate room in houseboat -2.320 (2.202)
## property_typePrivate room in loft -2.628 (2.234)
## property_typePrivate room in nature lodge -8.830** (3.740)
## property_typePrivate room in rental unit -2.734 (2.190)
## property_typePrivate room in residential home -1.795 (2.216)
## property_typePrivate room in serviced apartment 1.015 (2.348)
## property_typePrivate room in tiny home 2.121 (2.884)
## property_typePrivate room in tiny house -5.690* (3.036)
## property_typePrivate room in townhouse -3.488 (2.201)
## property_typePrivate room in villa -0.615 (2.497)
## property_typeRoom in aparthotel 0.376 (1.913)
## property_typeRoom in bed and breakfast -0.911 (2.426)
## property_typeRoom in boutique hotel -0.951 (2.204)
## property_typeRoom in hostel -3.060 (2.504)
## property_typeRoom in hotel -1.522 (2.212)
## property_typeRoom in serviced apartment -3.315 (2.466)
## property_typeShared room in bed and breakfast 3.997 (4.016)
## property_typeShared room in home -0.935 (3.887)
## property_typeShared room in hostel 7.569*** (2.662)
## property_typeShared room in houseboat 4.966* (2.959)
## property_typeShared room in rental unit 6.670** (2.820)
## property_typeShared room in residential home
## property_typeTent 2.520 (3.715)
## property_typeTiny home -2.566 (2.020)
## property_typeTiny house -5.740* (3.421)
## property_typeTower 1.486 (2.305)
## property_typeWindmill 0.392 (3.462)
## property_typeYurt 0.902 (4.577)
## bathrooms_text0 baths -1.120 (1.454)
## bathrooms_text0 shared baths -3.463** (1.503)
## bathrooms_text1 bath 0.545 (1.161)
## bathrooms_text1 private bath 0.509 (1.150)
## bathrooms_text1 shared bath 0.152 (1.166)
## bathrooms_text1.5 baths 0.491 (1.160)
## bathrooms_text1.5 shared baths 0.775 (1.172)
## bathrooms_text2 baths 0.092 (1.176)
## bathrooms_text2 shared baths -1.676 (1.554)
## bathrooms_text2.5 baths 0.143 (1.202)
## bathrooms_text2.5 shared baths 5.418 (4.616)
## bathrooms_text3 baths 0.208 (1.238)
## bathrooms_text3 shared baths 3.005 (1.925)
## bathrooms_text3.5 baths 1.183 (1.415)
## bathrooms_text3.5 shared baths -1.051 (2.289)
## bathrooms_text4 baths 0.538 (2.136)
## bathrooms_text4.5 baths -2.103 (2.113)
## bathrooms_text5 baths 2.097 (1.845)
## bathrooms_text5.5 baths -0.207 (2.831)
## bathrooms_textHalf-bath 0.048 (1.814)
## bathrooms_textPrivate half-bath 1.479 (1.710)
## bathrooms_textShared half-bath -1.026 (1.383)
## minimum_nights 0.001 (0.002)
## maximum_nights 0.0003*** (0.0001)
## price 0.001 (0.001)
## number_of_reviews -0.003*** (0.001)
## review_scores_rating -0.123 (0.158)
## reviews_per_month 0.093*** (0.024)
## host_has_profile_pict -0.506 (0.738)
## host_identity_verifiedt -0.251* (0.133)
## beds 0.101* (0.058)
## bedrooms -0.321*** (0.097)
## landmark_nn3 -80.707 (84.248)
## landmark_nn4 108.208 (81.847)
## trans_nn3 137.439* (74.944)
## trans_nn5 -174.220** (80.500)
## wifi 0.020 (0.159)
## tv -0.134 (0.124)
## private 0.111 (0.111)
## dryer 0.071 (0.184)
## quite 0.360 (0.269)
## clean 0.085 (0.155)
## availability_90 0.244*** (0.002)
## Constant -1.074 (2.389)
## ----------------------------------------------------------------------------------------
## Observations 8,866
## R2 0.729
## Adjusted R2 0.724
## Residual Std. Error 4.250 (df = 8729)
## F Statistic 172.286*** (df = 136; 8729)
## ========================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
test_set_result<-
listing_test_new %>%
mutate(Regression = "baseline Regression",
price.Predict = predict(reg.2, listing_test_new),
price.Error = price.Predict - price,
price.AbsError = abs(price.Predict - price),
price.APE = (abs(price.Predict - price)) / price.Predict)%>%
mutate(Regression = "demand Regression",
demand.Predict = predict(reg_demand1, listing_test_new),
demand.Error = demand.Predict - availability_30,
demand.AbsError = abs(demand.Predict - availability_30),
demand.APE = (abs(demand.Predict - availability_30)) / demand.Predict)
test_set_result<- test_set_result%>%
dplyr::select(-bathrooms, -neighbourhood_group_cleansed,-calendar_updated)
test_set_result <- test_set_result%>%
na.omit()
# Error metrics come from the columns just computed on test_set_result
MAE = mean(test_set_result$price.AbsError, na.rm = T)
MAPE = mean(test_set_result$price.APE, na.rm = T)
Revenue_ <- test_set_result$price.Predict * test_set_result$demand.Predict
Airbnb_Listing <- test_set_result$name
Location <- test_set_result$host_neighbourhood
benefit_an <- data.frame(Airbnb_Listing,Location,Revenue_)%>%
filter(Revenue_ > 0)
benefit_an <- benefit_an %>% mutate_all(na_if,"")%>%
na.omit()
benefit_an1 <- benefit_an %>%
group_by(Location)%>%
summarise(mean_Revenue = mean(Revenue_))
We computed the average Airbnb revenue in each district by multiplying the predicted price column by the predicted availability column and subtracting the 3% app service fee. Finally, we placed this column side by side with the average long-term rent to show the difference in revenue.
datas<- benefit_an[sample(nrow(benefit_an),10),]
District <- c("Centrum", "West", 'Nieuw-West', 'Zuid', "Oost", "Noord", "Zuidoost")
Observed_Long_term_Rent <- c(538, 529,560,560,566,574,540)
Predicted_Airbnb_Revenue <- (c(1732.6, 829.7, 479.1, 785.5, 677.7, 639.5, 785.5))*0.97
benefit_an2 <- data.frame(District ,Observed_Long_term_Rent,Predicted_Airbnb_Revenue)%>%
mutate(Comparison = Predicted_Airbnb_Revenue-Observed_Long_term_Rent )
library("kableExtra")
benefit_an2 %>%
kbl(caption = "Predicted Airbnb Revenue Compared with Observed Long-term Housing Price in Each District")%>%
kable_minimal()
| District | Observed_Long_term_Rent | Predicted_Airbnb_Revenue | Comparison |
|---|---|---|---|
| Centrum | 538 | 1680.622 | 1142.622 |
| West | 529 | 804.809 | 275.809 |
| Nieuw-West | 560 | 464.727 | -95.273 |
| Zuid | 560 | 761.935 | 201.935 |
| Oost | 566 | 657.369 | 91.369 |
| Noord | 574 | 620.315 | 46.315 |
| Zuidoost | 540 | 761.935 | 221.935 |
The “Comparison” column is “Predicted_Airbnb_Revenue” minus “Observed_Long_term_Rent”. Predicted Airbnb revenue is higher than the observed long-term rent in every district except Nieuw-West. Centrum has by far the highest predicted Airbnb revenue, yet it has the second-lowest rent in the long-term rental market, which is an interesting discovery. We think this may be because long-term and short-term rentals target different customer groups: residents who need to live in the city for a while, and short-stay travelers.
ggplot() +
geom_sf(data = neighbourhoods_geo) +
geom_sf(data = test_set_result, aes(colour = q5(price.AbsError)), na.rm = TRUE,
show.legend = "point", size = 1.25) +
scale_colour_manual(values = palette5blue,
labels=qBr(test_set_result,"price.AbsError"),
name="Absolute Residual") +
labs(title="Absolute Residual of Prediction", caption = 'Map7-1') +
mapTheme()
The absolute residual map shows that our price regression model tends to produce larger residuals in central areas.
coords.test <- st_coordinates(test_set_result)
neighborList.test <- knn2nb(knearneigh(coords.test, 5))
spatialWeights.test <- nb2listw(neighborList.test, style="W")
test_set_result <- test_set_result %>%
mutate(lagPriceError = lag.listw(spatialWeights.test, price.Error, na.rm = T))
test_set_result <- test_set_result %>%
mutate(lagPrice = lag.listw(spatialWeights.test, price, na.rm = T))
ggplot(test_set_result, aes(x=lagPriceError, y=price)) + geom_point()+
geom_smooth(method=lm, color='red')+
labs( title = 'Error as a Function of the Spatial Lag of Airbnb Price',caption = 'Figure7-1' , subtitle = sprintf('correlation = %s',round(cor(test_set_result$lagPriceError, test_set_result$price), 2)))+
plotTheme()
The lag error plot suggests that as the Airbnb price errors increase, nearby Airbnb price errors decrease slightly. However, the correlation is relatively weak, so we cannot conclude whether the listings are spatially autocorrelated. Further analysis using Moran’s I would help to draw a conclusion.
ggplot(test_set_result, aes(x=lagPrice, y=price)) + geom_point()+
geom_smooth(method=lm, color='red')+
labs( title = 'Price as a Function of the Spatial Lag of Airbnb Price \n spatial lag of price (mean price of 5 nearest neighbors)',caption = 'Figure7-2' , subtitle = sprintf('correlation = %s',round(cor(test_set_result$lagPrice, test_set_result$price), 2)))+
plotTheme()
In the scatterplot of price against lagged price, the price of nearby Airbnb listings rises as a listing's own price rises. The correlation between the price and its spatial lag is 0.39, which is moderate rather than strong. We therefore do not treat it as substantial evidence that Airbnb prices cluster.
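To make the idea of a spatial lag concrete, the following is a minimal base R sketch under stated assumptions. It computes, for each point, the mean of a variable over its k nearest neighbours, which is conceptually what a row-standardised k-nearest-neighbour weights list produces. The coordinates, prices, and the helper name `spatial_lag_knn` are made up for illustration and are not part of the report's data or code.

```r
# Toy coordinates and prices (made-up values, not from the Airbnb data):
# two tight clusters of four points each.
coords <- cbind(x = c(0, 1, 0, 1, 5, 6, 5, 6),
                y = c(0, 0, 1, 1, 5, 5, 6, 6))
price  <- c(100, 110, 105, 115, 300, 310, 305, 315)

# Hypothetical helper: mean of `values` over each point's k nearest neighbours.
spatial_lag_knn <- function(coords, values, k) {
  d <- as.matrix(dist(coords))        # pairwise Euclidean distances
  diag(d) <- Inf                      # a point is not its own neighbour
  vapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]   # indices of the k closest points
    mean(values[nn])
  }, numeric(1))
}

lag_price <- spatial_lag_knn(coords, price, k = 3)

# Correlation between each price and the mean price of its neighbours
cor(lag_price, price)
```

In this toy case the two clusters have very different price levels, so the lagged price tracks the price closely and the correlation is high. A weaker correlation, as in the report, means neighbouring prices explain less of a listing's own price.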
#moran's I
moranTest <- moran.mc(test_set_result$price.Error,
spatialWeights.test, nsim = 999)
ggplot(as.data.frame(moranTest$res[c(1:999)]), aes(moranTest$res[c(1:999)])) +
geom_histogram(binwidth = 0.01) +
geom_vline(aes(xintercept = moranTest$statistic), colour = "#FA7800",size=1) +
scale_x_continuous(limits = c(-1, 1)) +
labs(title="Observed and permuted Moran's I",
subtitle= "Observed Moran's I in orange",
x="Moran's I",
y="Count",
caption = 'Figure 7-3') +
plotTheme()
In the Moran’s I plot, the observed Moran’s I value, shown in orange, is not large. Based on this result, we conclude that there is no significant spatial autocorrelation in the errors of our price regression.
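As a rough illustration of what a permutation test for Moran's I does, here is a base R sketch on made-up data. It assumes a row-standardised weights matrix (each row sums to 1), in which case Moran's I reduces to z'Wz / z'z for the centred variable z. The helper `morans_i`, the line-graph neighbours, and the simulated errors are all hypothetical; the report itself relies on `moran.mc` from spdep, whose result also carries a pseudo p-value in `p.value`.

```r
# Hypothetical helper: Moran's I for x under a row-standardised matrix W.
morans_i <- function(x, W) {
  z <- x - mean(x)
  as.numeric(t(z) %*% W %*% z) / sum(z^2)
}

# Toy neighbours: 10 points on a line, each linked to its adjacent points.
n <- 10
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  nb <- setdiff(c(i - 1, i + 1), c(0, n + 1))  # drop out-of-range indices
  W[i, nb] <- 1 / length(nb)                   # rows sum to 1
}

set.seed(1)
errors <- rnorm(n)   # stand-in for price.Error (made-up values)

observed <- morans_i(errors, W)

# Shuffle the values across locations to build the null distribution.
nsim <- 999
permuted <- replicate(nsim, morans_i(sample(errors), W))

# One-sided pseudo p-value: share of statistics at least as large as the
# observed one, counting the observed value itself.
p_value <- (sum(permuted >= observed) + 1) / (nsim + 1)
c(observed = observed, p_value = p_value)
```

Reading the pseudo p-value alongside the histogram gives a firmer basis for the conclusion than the size of the observed statistic alone.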
ggplot(test_set_result, aes(x=price.Predict, y=price)) +
geom_point()+
geom_smooth(method=lm, colour="red") +
geom_abline(color = '#BFC3D6', size = 1)+
labs( title = 'Price as a Function of Predicted Price',subtitle = sprintf('correlation = %s',round(cor(test_set_result$price.Predict, test_set_result$price), 2)), caption = 'Figure 7-4')+
plotTheme()
In the plot of observed price against predicted price, our regression appears to perform well for listings priced below about $300-$350 per night. The variance starts to increase for more expensive listings, which indicates that there is still room for improvement in our model. The grey line represents perfect agreement between the predicted price and the observed price. The red line is the fitted trend between the two, and because the two lines are very close to each other, we conclude that our model has acceptable accuracy.
fitControl <- trainControl(method = "cv", number = 100)
set.seed(825)
reg.cv <-
train(price ~ ., data = as.data.frame(listing_training) %>%
dplyr::select( price, host_response_rate, host_acceptance_rate, host_listings_count, host_total_listings_count, neighbourhood_cleansed, room_type, accommodates,property_type, bathrooms_text, minimum_nights, maximum_nights, availability_30, number_of_reviews, review_scores_rating, reviews_per_month, host_has_profile_pic, host_identity_verified, beds,bedrooms,landmark_nn3,landmark_nn4,trans_nn3,trans_nn5,wifi,tv ,private, clean, quite),
method = "lm", trControl = fitControl, na.action = na.pass)
reg.cv
## Linear Regression
##
## 14704 samples
## 28 predictor
##
## No pre-processing
## Resampling: Cross-Validated (100 fold)
## Summary of sample sizes: 14558, 14557, 14556, 14556, 14557, 14556, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 88.29713 0.5024856 61.76172
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
hist(reg.cv$resample[,3],
main="Cross-Validation MAE Histogram Chart",
xlab="Distribution of Mean Absolute Errors ",
sub = 'Figure 7-5',
col = 'skyblue3',
breaks =50)
The histogram of cross-validation MAE is roughly normal and tightly clustered, which indicates that our model generalizes acceptably.
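The logic behind the cross-validation histogram can be sketched in base R without caret. The example below is only an illustration on simulated data: the data frame `toy`, its single predictor, and the choice of 10 folds are assumptions made for brevity, not the report's 100-fold setup.

```r
set.seed(825)

# Made-up data: price depends linearly on accommodates plus noise.
n <- 200
toy <- data.frame(accommodates = sample(1:6, n, replace = TRUE))
toy$price <- 50 + 30 * toy$accommodates + rnorm(n, sd = 20)

k <- 10
fold <- sample(rep(seq_len(k), length.out = n))  # random fold assignment

# For each fold: fit on the other folds, predict the held-out fold,
# and record the mean absolute error.
fold_mae <- vapply(seq_len(k), function(f) {
  fit  <- lm(price ~ accommodates, data = toy[fold != f, ])
  pred <- predict(fit, newdata = toy[fold == f, ])
  mean(abs(pred - toy$price[fold == f]))
}, numeric(1))

mean(fold_mae)  # average out-of-fold MAE
sd(fold_mae)    # spread across folds
```

A small spread of the per-fold MAE relative to its mean is what a tightly clustered histogram shows: the error does not depend much on which observations were held out.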
#listings_prices <- airbnb_geo %>%
#dplyr::select(id, price)
library(gghighlight)
pricesERROR.in.neighbors <- #spatial join of neighborhood and listing information
neighbourhoods_geo %>%
st_join(test_set_result, join = st_intersects)%>%
dplyr::select(-neighbourhood_group)
neighbors_mean_priceERROR <- pricesERROR.in.neighbors %>% #the mean MAPE of each neighborhood
group_by(neighbourhood_cleansed)%>%
summarise(mean_MAPE_percent = mean(price.APE *100, na.rm = T))
ggplot() +
geom_sf(data=neighbors_mean_priceERROR, aes(fill = q5(mean_MAPE_percent)))+
scale_fill_manual(values = palette5blue,
labels = qBr(neighbors_mean_priceERROR, "mean_MAPE_percent"),
name = "mean MAPE% \n(Quintile Breaks)")+
labs(title = "Airbnb Model Mean MAPE in Each Neighborhood",caption = 'Figure 7-6')
Finally, the mean MAPE map shows that our model didn’t perform well in parts of the Nieuw-West, Noord, and Zuidoost districts. We believe this may be caused by spatial information omitted from the regressions.
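For reference, the per-neighbourhood MAPE can be sketched in base R as follows. The neighbourhood labels and prices are made up. The absolute percentage error is divided by the predicted price to match the `price.APE` definition used in this report; the more common convention divides by the observed price, which would give different numbers.

```r
# Made-up observed and predicted prices for three hypothetical neighbourhoods.
toy <- data.frame(
  neighbourhood = c("A", "A", "B", "B", "C", "C"),
  price         = c(100, 200, 150, 250, 80, 120),
  price_predict = c(110, 180, 150, 200, 100, 100)
)

toy$abs_error <- abs(toy$price_predict - toy$price)

# Absolute percentage error, dividing by the predicted price as in the report.
toy$ape <- toy$abs_error / toy$price_predict

# Mean APE per neighbourhood, expressed in percent.
mape_by_nb <- aggregate(ape ~ neighbourhood, data = toy,
                        FUN = function(x) mean(x) * 100)
names(mape_by_nb)[2] <- "mean_MAPE_percent"
mape_by_nb
```

Here neighbourhood C has the highest mean MAPE (20%), so it would fall in the darkest class of a map like the one above.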
As for the mean short-term rental revenue by district, Centrum clearly has much higher revenue than the other districts. Long-term rents, however, are similar across districts, and Centrum has the second-lowest, the opposite of the short-term pattern. We think this may reflect the different needs of the two groups of tenants. The city center tends to have more convenient public transportation and denser service facilities, which short-term travelers care more about, so they are willing to pay higher prices there. Long-term renters, on the other hand, are usually people who need to live in the area for a period of time, and for them a quiet and beautiful area may be more attractive than a noisy city center.
In general, if you have an apartment in the center of Amsterdam, our predictions suggest it is clearly more profitable to list it on Airbnb as a short-term rental. In every district except Nieuw-West, Airbnb is predicted to bring hosts higher revenue than the local long-term rental market.
listings_prices <- airbnb_geo %>%
dplyr::select(id, price)
prices.in.neighbors <- #spatial join of neighborhood and listing information
neighbourhoods_geo %>%
st_join(listings_prices, join = st_intersects)%>%
dplyr::select(-neighbourhood_group)
neighbors_mean_price <- prices.in.neighbors %>% #the mean price of each neighborhood
group_by(neighbourhood)%>%
summarise(mean_price = mean(price))
ggplot() +
geom_sf(data=neighbors_mean_price, aes(fill = q5(mean_price)))+
scale_fill_manual(values = palette5blue,
labels = qBr(neighbors_mean_price, "mean_price"),
name = "mean_price\n(Quintile Breaks)") +
geom_sf(data=airbnb_geo, color = 'red', size = 0.5)+
labs(title = "Airbnb Listings in Each Neighborhood", subtitle = "Figure 7-6")
Overall, our predictions rest on the idealized assumption that hosts have no expenses other than the 3% Airbnb service fee, so they keep 97% of the listing price as income. In reality, short-term rental landlords often face additional costs, such as damage to the house and its facilities.
Also, a long-term lease usually covers only the rent of the house, and the tenant pays additional fees such as water, electricity, and internet separately. Short-term rental listings include these expenses in the price, which also makes short-term rental income look higher than long-term rental income.
We use the “availability_30” column from the original data set as the number of days each apartment is not rented in the next 30 days. The number of days rented out is therefore 30 minus this value. For example, an apartment with an “availability_30” value of 7 is treated as rented for 30 - 7 = 23 of the next 30 days. However, some apartments have an “availability_30” of x days only because they are open for booking for just the next x days, so we may be overestimating the number of days an apartment can be rented on Airbnb.
After predicting each apartment's price and its number of rental days within 30 days, we multiply the two to get the host's 30-day short-term rental income from Airbnb. We then average this income by district and compare it with long-term rents in the local rental market. Even within one district, however, the prices of different apartments may vary greatly, which makes our result less useful to hosts who want to compare prices in a more precise geographical area. We could provide a more precise comparison if we obtained mean long-term rents for smaller areas.
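The revenue calculation described above can be written out as a short base R sketch. It follows the prose description: rented days are taken as 30 minus the predicted availability, revenue is price times rented days, and 3% is deducted as the service fee. The listings, districts, and column names below are made up for illustration and do not reproduce the report's figures.

```r
# Made-up predictions for a few hypothetical listings.
toy <- data.frame(
  district        = c("Centrum", "Centrum", "West", "West"),
  price_predict   = c(200, 180, 120, 100),  # predicted nightly price
  availability_30 = c(20, 22, 24, 25)       # predicted days still available
)

service_fee <- 0.03  # host-side fee assumed in the report

# Days rented out in the next 30 days, then 30-day revenue net of the fee.
toy$booked_days <- 30 - toy$availability_30
toy$revenue_30  <- toy$price_predict * toy$booked_days * (1 - service_fee)

# Average 30-day revenue per district, ready to compare with long-term rent.
district_revenue <- aggregate(revenue_30 ~ district, data = toy, FUN = mean)
district_revenue
```

Because the result is an average over all listings in a district, two apartments in the same district with very different prices are treated alike, which is the limitation noted above.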